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ABSTRACT 

We propose a novel algorithm for image reconstruction in radio interferometry. The ill-posed 
inverse problem associated with the incomplete Fourier sampling identified by the visibility 
measurements is regularized by the assumption of average signal sparsity over representa- 
tions in multiple wavelet bases. The algorithm, defined in the versatile framework of convex 
optimization, is dubbed Sparsity Averaging Reweighted Analysis (SARA). We show through 
simulations that the proposed approach outperforms state-of-the-art imaging methods in the 
field, which are based on the assumption of signal sparsity in a single basis only. 
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1 INTRODUCTION 

Aperture synthesis in radio interferometry is a powerful tech- 
nique that dates bac k more than sixty years (|Rvle & Vonberd 
19461: iBlvthd Il957l: iRvle et all 119591 : iRvle & Hewisd Il960l : 
Thompson et al.l2004l) . It allows observation of the sky with oth- 
, erwise inaccessible angular resolution and sensitivity (i.e. dynamic 

■ range), providing a wealth of information for astrophysics and cos- 
' mology. The measurement equation for aperture synthesis provides 

incomplete linear information about the signal, thus defining an 
ill-posed inverse problem in the perspective of signal reconstruc- 
tion. Under restrictive assumptions of narrow-band (i.e. monochro- 

■ matic) non-polarized imaging on small fields of view, the visibili- 
' ties measured identify with Fourier measurements. Already pow- 
, erful calibration and imaging techniques have been developed in 

the field. Standard imaging algorithms, such as CLEAN and its 
multi- scale var iants iHog bom 1974; Bhatnagar & Comwell 2004; 
ICornwellll2008h , regularize the inverse problem through an implicit 
sparsity assumption of the signal in the spatial dimensions. 

The new science envisaged in astronomy in the coming 
decades requires that next-generation radio telescopes, such as the 
new LOw Frequency ARray (lofarB, or the future Extended 
Very Large Array (evlaB and Square Kilometer Array (SK^fl), 
achieve much higher dynamic range than current instruments, also 
at higher angular resolution. These telescopes will also have to 
consider wide-band (i.e. hyper- spectral) polarized imaging on wide 
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fields of view on the celestial sphere. Direction-dependent effects 
further complicate the measurement equation, and will have to be 
calibrated and accounted for in this high-dimensional imaging pro- 
cess, and calibrated. In this context, calibration and imaging tech- 
niques for radio interferometry literally need to be re-invented, thus 
triggering an intense research in the field. 

The now famous theory of compressed sensing deals with 
the rec overy of sparse signals from incomplete lin ear measure- 
ments (lDonohd[200^ : ICandesl[200^ : lBaraniukll2007h . It acknowl- 
edges the fact that natural signals often exhibit a sparse represen- 
tation in multi-scale bases. Compressed sensing proposes both op- 
timization of the acquisition technique, and non-linear iterative al- 
gorithms for signal reconstruction regularizing the ill-posed inverse 
problem through a sparsity prior. These algorithms are defined ei- 
ther in the context of convex optimization, or greedy approaches. 
It is also important to note that, beyond the pure theory of com- 
pressed sensing, these frameworks are particularly versatile and can 
account for a large variety of priors. 

The first application of compressed sensi ng to radio interfer- 
ometry was performed bv lWiaux et al ] (l2009ah . where the problem 
of image reconst ruction from i ncompl ete visibility measurements 
was considered. IWiaux etal ] (l2009ah demonstrated the versatil- 
ity of the approach and its superiority relative to standard interfer- 
ometric imaging techniques. The spread spectrum phenomenon, 
which arises by partially relaxing the small field-of-view (FOV) 
ass umption and including a first order w term, was introduced 
bv lWiaux etaD ( l2009bh as a potential optimization of the acqui- 
sition, leading to enhanced image reconstruction quality for spar- 
sity bases that are not maximally incoherent with the measurement 
basis. Furthermore, a compressed s ensing approach was devel- 
oped and evaluated by IWiaux et iD (l20ld) to recover the signal 
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induced by cosmic string s in the cosmic microwave background. 
iMcEwen & Wiauxl (l201lh ge neralise the compres sed s ensing imag- 
ing tech niques developed bv lWiaux etaD(l2009ah and lWiaux et alJ 
(l2009bh to a wide FOV, recovering interferometric images defined 
directly on the sphere, rather than a tangent plane. All of these 
works consider uniformly random and discrete visibility coverage 
in order to remain as close to the theory of compressed sensing 
as possible. First ste ps towards more realist ic visibility coverage s 
have been taken by I Su k smond |2009) and IWenger et all (l20ld) . 
who consider coverages due to s pecific in t erfero meter configura- 
tions but which remain discrete. iLi et al.l (l201lh studied a com- 
pressed sensing imaging approach based on the isotropic undec- 
imated wavelet transform, reporting results from discrete simu- 
lated coverages of ASKAP. These preliminary works suggest that 
the performance of compressed sensing reconstructions is likely to 
hold for more realistic visibility coverages. 

Convex optimization methods coupled with sparsity priors 
have proven to be a powerful framework for radio-interferometric 
imaging. Beyond the versatility that enables one to impose a wide 
range of sparsity priors, convex optimization provides significant 
improvements in the speed of the reconstruction process relative to 
state-of-the-art imaging methods in radio interferometry. This en- 
hancement in speed is crucial for the scalability of the techniques 
to very high dimensions in the perspective of next-generation tele- 
scopes. 

In the present work, we propose a novel algorithm for ra- 
dio interferometric imaging, defined in the framework of convex 
optimization, dubbed the sparsity averaging reweighted analysis 
(SARA) algorithm. The algorithm relies on the conjecture that as- 
trophysical signals are simultaneously sparse in various bases, in 
particular the Dirac basis, wavelet bases, or in their gradient, so 
that promoting average signal sparsity over multiple wavelet bases 
represents an extremely powerful prior. For comparison, we also 
study a variety of fast image reconstruction algorithms designed 
in the frameworks of convex optimization and sparse signal mod- 
elling, some of which were identified as providing similar perfor- 
mance as CLEAN and its multi- scale versions. We show, through 
realistic simulations, the superiority of SARA compared to most 
radio-interferometric imaging techniques. 

The organization of the remainder of the paper is the follow- 
ing. In Section [21 we review convex optimization methods for 
sparse inverse problems in the compressed sensing framework and 
discuss their versatility. In Section |3] we recall the inverse problem 
for image reconstruction from radio-interferometric data and con- 
cisely describe the state-of-the-art image reconstruction techniques 
used in radio astronomy. Section |4]presents the SARA algorithm. 
Numerical results of the SARA algorithm compared to state-of- 
the-art imaging techniques are presented in Section |5] Finally we 
conclude in Section[6lwith closing thoughts. 



2 COMPRESSED SENSING AND CONVEX 
OPTIMIZATION 

Convex optimization provides a powerful and versatile framework 
to solve sparse linear inverse problems such as those posed in ra- 
dio interferometry. In this section, we concisely recall the inverse 
problem for sparse signals considered in the compressed sensing 
framework and proceed further with a discussion of the versatility 
offered by convex optimization approaches. Finally, we review the 
key ideas behind the methods to solve these convex problems. 



2.1 Compressed sensing 



In the framework of compressed sensing JCa 
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) the signals probed are firstly assumed to be sparse or com- 



pressible in some basis. Consider a complex- valued signal denoted 
by the vector x G C^. An orthonormal basis ^ G C^^^ is also 
considered, in which the decomposition a G of the signal is 
defined by cc = v|/a. The signal is said to be sparse if it contains 
only K non-zero coefficients in its decomposition, where K <^ N, 
or compressible if its ordered set of coefficients decays rapidly and 
the signal can be well approximated by just the first K coefficients. 

Secondly, the signal is assumed to be probed by M linear mea- 
surements denoted by a vector y G in some sensing basis 
G C^^^ and possibly affected by independent and identically 
distributed noise n G C^. This defines an inverse problem 

1/ = 0a + n with = 0v|/ g C^""^, (1) 

where the matrix identifies the sensing basis as seen from the 
sparsity itself. Typically M < N so that the inverse problem is 
ill-posed. 

The ideal approach to recover a from ([T]) is to find the sparsest 
representation a that is consistent with the measurements, posing 
the following problem: 

min llallo subject to \\y — 0q:||2 ^ e, (2) 

where the io norm, ||q:||o, counts the number of non-zero ele- 
ments in a and e is an upper bound on the £2 norm ||n||2 of 
the residual noise, with n = y — Qci. Let us recall that the ip 
norm of a complex- valued vector a G is defined as ||a||p = 
(J2iLi where | • | represents the modulus of a complex 

number. 

The problem in ^ is combinatorial and NP-complete. The 
most common approach is to replace the io nor m by the ii norm 
and pose a convex problem t o estimate a solution jChen et al.l200ll : 
ICandesl2006l : lDonohol200"^ . In the presence of noise, the so-called 
Basis Pursuit denoise problem is the minimization of the £1 norm 
II a 111 of the coefficients of the signal in the sparsity basis under a 
constraint on the £2 norm ||n||2 of the residual noise: 

min ||a||i subject to ||i/ — 0a||2 ^ e. (3) 

The theory shows that the £0 and Basis Pursuit denoise prob- 
len is are equivalent under certain properties of the sensing matrix, 
dCandes et al.ll200^ : ICandesI [20061 . l2008h . The theory also of- 
fers various ways to design suitable sensing matrices, showing in 
particular that a small number of measurements is required relative 
to a naive Nyquist-Shannon sampling: M <^ N. Note that, in 
theory, an explici t £0 minimization would require fewer m easure- 
ments, M ^ 2K dCandes et al.ll2006l : ICandesll2006Ll2008h . 

A family of iterative greedy algorithms are also prop osed 
in the literature (Mal lat & Zhang 1993; Tropp & Gilbert 2 0071: 
iNeedell & TropDl[2008l : iBhimensath & Daviesll2009i\ These algo- 
rithms are shown to enjoy similar approximate reconstruction prop- 
erties, however, requiring more measurements for exact reconstruc- 
tion than convex optimization approaches. 

2.2 Convex optimization versatility 

While the theory of compressed sensing provides reconstruction 
guarantees for the £1 minimization problem, convex optimization 
is extremely versatile and can account for many variations, which 
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in practice can prove more effective for signal reconstruction. In 
the following we briefly describe these variations. 

Positivity: One of the main advantages of convex approaches is 
the flexibility that they provide to include prior information about 
the signal as convex constraints. In the case of image processing 
problems, where most images of interest are intensity images, the 
signals are real-valued and positive, i.e. x G M+ . This constraint 
is convex and can be easily added to the optimization problems 
without much computational load increase and without affecting 
their convergence. This constraint has proven to be very effective 
in i mproving reconstruc tion quality in radio-interf erometric imag- 
ing jWiaux et al.ll2009ah . 

Constrained vs unconstrained problems: The least squares 
regularized problem is an alternative formulation of the ba- 
sis pursuit denoising problem that recovers a sparse signal 
as the solution of an unconstrained problem formulated as: 
min^^c^ III?/ — ©CK||i + A||a||i, where A is a regularization pa- 
rameter that balances the weight between the fidelity term and the 
regularization term. It follows that determining the pro per value of 
A is a kin to determining the power limit of the noise (IChen et al.l 
l200lh . However, there is no optimal strategy to fix the regular- 
ization parameter even if the noise level is known, therefore con- 
strained problems, such as (|3}, offer a stronger fidelity term when 
the noise power is known, or can be estimated a priori. 

Orthogonal vs overcomplete representations: The techniques 
mentioned above hold for signals which are sparse in the standard 
coordinate basis or sparse with respect to some other orthonormal 
basis. However, there are numerous practical examples in which a 
signal of interest is not sparse in a single orthonormal basis but 
over several orthonorma l bases or over an overcomplete dictio- 
nary jCandes et al.ll201Qb . In the generalized setting the signal x 
is now expressed in terms of a dictionary v|/ g C^^^ {N < D) as 
X — v|/a, a e C^. Note that the problem is now severely under- 
determined since M <^ N < D, therefore requ iring greater spar- 
sity or compressibility of a. iRauhut etal.l(l2008h find conditions on 
the compound matrix 0vl; such that a can be recove r ed acc urately, 
which leads to a good estimate of x. ICandes et al.l (120101) extend 
the compressed sensing theory to redundant dictionaries, providing 
theoretical stability guarantees based on general conditions on the 
sensing matrix 0. 

Analysis vs synthesis problems: The basis pursuit denoising 
problem defines the optimization in the sparse representation do- 
main finding the optimal representation vector a and then recover- 
ing the true signal trough the synthesis relation x — U^a. These 
methods are known as synthesis based methods in the literature. 
Synthesis-based problems may also be substituted by analysis- 
based problems, where instead of estimating a sparse representa- 
tion o f the signal, the methods recover the signal itself (lElad et al.l 
l2007h . In the case of orthonormal bases, U^, the two approaches are 
equivalent. However, when v|; is a frame or an overcomplete dictio- 
nary, the two problems are no lo nger equivalent. T he geometry of 
the two problems are studied bv lElad et al.l (l2007h . who show that 
because these geometrical structures exhibit substantially different 
properties, there is a large gap between the two formulations. One 
remark to make is that the analysis problem does not increase the 
dimensionality of the problem relative to solving for the signal it- 
self, even in the case when overcomplete dictionaries are used. Em- 
pirical studies have shown very promising results for the analysis 



approach (lElad et al.ll2007l) . ICandes et all (l2010h provide a theo- 
retical analysis of the li analysis problem coupled with redundant 
dictionaries. 

Reweighted ^l vs h minimization: As discussed above, the h 
minimization problem is equivalent to ^Q minimization when the 
measurement matrix satisfies certain conditions defined in the con- 
text of compressed sensing. In general though, the key difference 
between the two problems, of course, is that ^l depends on the 
magnitudes of the coefficients of a signal, whereas minimization 
does not. To reconcile this imb alance, a reweighted minimiza- 
tion algorithm was proposed by ICandes et al.l (l2008h to mimic the 
^0 minimization behaviour. The algorithm replaces the norm in 
([3]) by a weighted li norm Ylf^i '^^ l<^* I- The idea behind this for- 
mulation is that large weights will encourage small coordinates of 
the solution vector, and small weights will encourage larger coor- 
dinates. As a motivational example suppose that the sparse signal 
a is known a priori and that we set the weights as it;^ = |ai|~^. 
In this case the weights are infinite at all locations where the sig- 
nal is zero, forcing the coordinates of the solution vector at these 
locations to be zero. This set of weights makes the weighted norm 
independent of the precise value of the non-zero components and 
guarantees to recover the correct solution assuming only K < M. 

In practice, the original signal is not known in advance but 
we can compute the appropriate weights by solving sequentially 
weighted ii problems, each using as weights essentially the in- 
verse of the values of the solution of the previous problem. Of 
course, it is not possible to have infinite weights where the esti- 
mated signal values are zero, so a stability parameter must also be 
added to the signal value in the selection of the weights. This proce- 
dure has been observed to be very effective in reducing the number 
of measurements needed for recovery, and to outperform standard 
-£i -minimizati on in many situations, see e.g. JCandes et al.ll2008l : 
lNeedellll2009l) . 

2.3 Convex optimization algorithms 

Unlike many generic optimization problems, convex optimization 
problems can be efficiently solved, both in theory (i.e., via al- 
gorit hms with worst-case poly nomial complexity) and in prac- 
tice (iMattinglev & Bov3 l2010l) . There exists a broad range of 
methods to efficiently solve convex problems, e.g. interior point 
methods, primal-dual methods and proximal splitting methods. 
Among these, proximal splitting methods offer great flexibility and 
are shown to capture and extend several well-known algorithms 
in a unifying framework. Douglas-Rachford, iterative threshold- 
ing, projected Landweber, projected gradient, alternating projec- 
tions, alternating direction method of multipliers, alternating split 
Bregman are special i nstanc es of proximal splitting algorithms 
dCombettes & Pesquej l201ll) . Such methods offer a powerful 
framework for solving convex problems in terms of speed and scal- 
ability of the techniques to very high dimensions. 

Proximal splitting methods solve optimization problems of the 

form 

min + + (4) 

where /i (cc ),..., /n(cc) are convex functions from to 
(—00,00). Note that any convex constrained problem can be for- 
mulated as an unconstrained problem by using the indicator func- 
tion of the convex constraint set as one of the functions in (|4]), i.e. 
fk{x) = ic{x) where C represents the constraint set. Also note 
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that complex- valued vectors are treated as real-valued vectors with 
twice the dimension (accounting for real and imaginary parts). A 
major difficulty that arises in solving this problem stems from the 
fact that, typically, some of the functions are not differentiable, 
which rules out conventional smooth optimization techniques. The 
key concept in proximal splitting methods is the use of the prox- 
imity operator of a convex function, which is a natural extension 
of the notion of a projection operator onto a convex set. For ex- 
ample, the proximal operator of the ^l norm is the soft-thresolding 
operator, and the proximal operator of the indicator function of a 
constraint is simply the projection operator onto the constraint set. 
Proximal splitting methods proceed by splitting the contribution of 
the functions /i , . . . , /n individually so as to yield an easily imple- 
mentable algorithm. They are called proximal because each non- 
smooth function in ^ is involved via its proximity operator. In 
essence, the solution to ^ is reached iteratively by successive ap- 
plic ation of the proximity opera tor associated with each function. 
See lCombettes & Pesquetl (l201lh for a review of proximal splitting 
methods and their applications in signal and image processing. 

One remark to make is that there also exist proximal splitting 
algorithms that offer a parallel implementation structure where all 
the proximity operators can be computed in parallel rather than se- 
quentially. Examples of these algorithms are the proximal paral- 
lel algorithm and the simultane ous-direction method of multipli- 
ers (ICombettes & Pesque3l201lh . Such a structure is useful when 
implementing the algorithms in multicore architectures, thus pro- 
viding a significant gain in terms of speed. 



3 RADIO INTERFEROMETRIC IMAGING 

In this section we recall the general form of the visibility mea- 
surements and also pose the corresponding interferometric inverse 
problem for image reconstruction under small FOV considerations. 
We also review the state-of-the-art imaging algorithms in radio in- 
terferometry. 

3.1 Visibilities 

In order to image a region of the sky, all radio telescopes of an in- 
terferometric array point in the same direction sq G on the unit 
celestial sphere. We consider a Cartesian coordinate system centred 
on the earth aligned with the pointing direction. At each instant 
of observation, each telescope pair measures a complex visibility 
defined as the correlation between incoming electric fields at the 
positions of the two telescopes. This visibility only depends on the 
relative position between the two telescopes, defined as a baseline. 
We consider a monochromatic signal x, and made up of incoher- 
ent sources. Also, we consider non-polarized radiation and a small 
FOV such that the signal on the celestial sphere is well approxi- 
mated by its projection on to plane orthogonal to sq. In this con- 
text, each visibility corresponds to the measurement of the Fourier 
transform of a planar signal at the spatial frequency u — (u^v) 
where (i^, v) identifies the baseline components in the image plane, 
and in units of the observation wav elength. This result is known 
as the van Cittert-Zemike theorem dThompson etal.ll2004 . The 
measured visibility reads as: 

y[u) = j A{l)x{l)e-^'^^-U\ (5) 

where I — (l^m) denotes the coordinates on the image plane and 
A (Z) is the so-called primary beam, which limits the observed 



FOV. The total number of points u probed by all telescope pairs 
of the array during the observation provides some incomplete cov- 
erage in the Fourier plane characterizing the interferometer. 

3.2 Inverse problem in matrix form 

In a practical setting we want to represent the map x by a dis- 
cretized image. The band-limited functions considered are com- 
pletely identified by their Nyquist-Shannon sampling on a discrete 
uniform grid ofA^ = A^^/^xA^^/^ points U ^M? m real space 
with 1 ^ i ^ N and by their corresponding discrete spatial fre- 
quencies Ui. The sampled intensity signal and primary beam are 
denoted b y the vectors ac, A G respectively. 

As in lWiaux et we assume that the spatial frequen- 

cies u probed by all telescope pairs during the observation belong 
to the discrete uniform grid of points Ui, thus bypassing gridding 
considerations for the sake of simplicity. The Fourier coverage pro- 
vided by the M spatial frequencies probed can simply be identified 
by a binary mask in the Fourier plane equal to 1 for each spa- 
tial frequency probed and otherwise. The visibilities measured 
may be denoted by a vector of M complex Fourier coefficients 
y G = {yh = y(ub)}i^b^M, possibly affected by com- 

plex noise of astrophysical or instrumental origin, identified by the 
vector n G C^. Since the signal x is real-valued, we could only 
take measurements in half of the plane and infer the measurements 
of the other half through conjugate relations. 

In this discrete setting, the Fourier coverage is in general in- 
complete in the sense that the number of real constraints 2M is 
smaller than the number of unknowns N; complete coverage of 
the Fourier plane corresponds to M = A^/2. An ill-posed inverse 
problem is thus defined for the reconstruction of the signal x from 
the measured visibilities y \ 

y = (t>x-\-n with = MFA, (6) 

where the matrix G C^^^ identifies the complete linear rela- 
tion between the signal and the visibilities. The matrix A G R^^^ 
is the diagonal matrix implementing the primary beam. The unitary 
matrix F G C^^^ implements the discrete Fourier transform pro- 
viding the Fourier coefficients. The matrix M G R^^^ is the 
rectangular binary matrix implementing the mask characterizing 
the interferometer. The inverse transform of the binary mask, i.e. 
F^M^Im with 1m G defining the vector of ones, identifies 
the dirty beam and the inverse transform of the Fourier measure- 
ments with all non-observed visibilities set to zero, i.e. F^M^y, is 
the dirty image. 

For signal reconstruction, a regularization scheme that encom- 
passes enough prior information on the original signal is needed in 
order to find a unique solution. All image reconstruction algorithms 
will differ through the kind of regularization considered. 

3.3 State-of-the-art imaging algorithms 

The most standard and otherwise already very effective image 
reconstruction algorithm from visibility measurements is called 
CLEAN, which is a non-linear deconvolution method ba s ed on 
local iterative beam removal (iHogbomI Il974l : ISchwarzl Il978l: 
[Thompson et al.ll2004l) . A sparsity prior on the original signal in 
real space is implicitly introduced. Multi- scale versions of CLEAN 
(MS-CLEAN) have also been developed (Com well 2008), where 
the sparsity is improved by multi- scale decomposition, hence en- 
abling better recovery of the signal. The MS-CLEAN method was 
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shown to perform better than the standard CLEAN, but still suffers 
from an empirical choice of basis profiles and scales. An adaptive 
scale pixel decomposition method called ASP-CLEAN was also in- 
troduced to improve on multi- scale CLEAN by rel ying on an adap- 
tive choice of scales (iBhatnagar & Comwellll2004l) . Note that these 
approaches are known to be slow, sometimes prohibitively. An- 
other approach to the reconstruction of images from visibility mea- 
surements is the Maximum Entropy Method (MEM). In contrast to 
CLEAN, MEM solves a global optimization problem in which the 
inverse problem is regularized by the introduction of an e ntropic 
prior on the signal, but sparsity is not explicitly req uired (lAblesI 
ll974l : lGull & Daniellll 19781 : IComwell & Evanslll985h . In practice, 
CLEAN is often preferred to MEM. 

Reconstruction techniques based on convex optimization and 
sparse models have also been proposed. Approaches based on 
^l minimization coupled with the Dirac basis have been previ- 
ously stu died in the field iMarsh & RichardsonI 19871 : IWiaux et"aD 
l2009al lbl: iMcEwen & Wiauxl 1201 ll : iLi et all 1201 ih . The equiva- 
lence between CLEAN and minimization has b een st udied in 
( Marsh & RichardsonI [T987h . IWiaux et all (l2009ah and iLi et all 
( 201ll) report that minimization yields similar reconstruction 
quality to CLEAN, while including a positivity constraint in a con- 
vex formulation significantly enhances the reconstruction quality 
relative to CLEAN. 

Extended structures do not have an optimal sparse represen- 
tation in Dirac basis. Thus, wavelet bases have also been consid- 
ered in order to provide a sparser representation. Synthesi s-based 
approaches with red unda nt representatio n s have been propose d by 
IWiaux et al.l (l2009ah and lLi et all (1201 ih . IWiaux et al.l ( l2009ah use 
a reweighted appro ach coupled wi th a steerable wavelet frame 
as sparsity dictionary. iLi et al.l (l201ll) use a least squares regu- 
larized problem with the isotropic undecimated wavelet transform 
(lUWT) as sparsity dictionary. The reconstruction quality of the 
lUWT method was reported to be superior than those of CLEAN 
and MS-CLEAN. 

Many signals in nature are also sparse or compressible in the 
magnitude of their gradient space, in which case the TV minimiza- 
tion problem has been shown to yield superior reconstruction re- 
sults (iRudin etal.ll 19921) . The TV norm is defined by the ii norm 
of the magnitude of the gradient of the signal II^IItv — [ | V^||i , 
where Vx denotes the gradient magnitude (lRudinetal.lll992h . 
From this formulation, it can be seen that the TV problem might 
be modelled as an analysis minimization problem with the dis- 
crete gradient operator as the sparsity inducing transform. TV 
minim ization was a l ready prop osed for radio-interferomet ric imag- 
ing bv lWiaux et aP (|201Q|) a ndlMcEwe n & Wiauxl (1201 ll) showing 
promising results. Moreover JWiaux et a l. (2010) used a reweighted 
TV minimization approach to recover the signal induced by cosmic 
strings in the cosmic microwave background. 



4 SPARSITY AVERAGING REWEIGHTED ANALYSIS 

In this section we propose a novel algorithm for radio- 
interferometric imaging based on the prior of average signal spar- 
sity over multiple wavelet bases. We start by discussing our con- 
jecture of average signal sparsity. Then, we propose the reweighted 
^l analysis method as a means to promote average sparsity. Finally, 
we present the resulting algorithm. 



4.1 Sparsity average conjecture 

As already discussed in the previous sections, while point and com- 
pact sources have a sparse representation in the Dirac basis, piece- 
wise smooth structures exhibit gradient sparsity, and continuous 
extended structures are better encapsulated in wavelet bases. As- 
tronomical images are often complex and all these types of struc- 
tures can be present at once. Therefore, we here conjecture that 
promoting average sparsity or compressibility over multiple bases 
rather single bases represents an extremely powerful prior. Note on 
a theoretical level that a single signal cannot be arbitrarily sparse 
simultaneously in any pair of b ases, due to the incoherence be- 
tween these bases fsee lWiaux et a l. ( 2009a) and references therein 
for a definition of incoherence). For illustration, a signal extremely 
sparse in the Dirac basis is completely spread in the Fourier ba- 
sis. We hypothesize that, for any pair of bases, there might exist 
a lower bound on the average sparsity of a signal, which identifies 
a generalized "uncertainty relation". In essence, our prior consists 
of assuming that the signals of interest are those that saturate this 
uncertainty relation between multiple pairs of bases. 

We propose using a dictionary composed of a concatenation 
of orthonormal bases, i.e. 



(7) 



thus G M ^ with D = qN. Given the previous considerations 
on astronomical images, the dictionary should be composed of the 
Dirac basis and wavelet bases. In particular, the Haar wavelet basis 
can be used as an alternative to gradient sparsity (usually imposed 
by a TV prior) to promote piecewise smooth signal^ See Sec- 
tion lS.ll for further details on a practical selection of these bases. 



4.2 Reweighted ii analysis problem 

In the light of our previous discussions on the versatility of con- 
vex optimization, we promote this average sparsity through a 
reweighted ii analysis method. Let us define the weighted ii prob- 
lem 

min \\\N^^x\\i (8) 

subject to \\y — <t>x\\2 ^ e 
and X ^ 0, 

where W G ^ ^ is a diagonal matrix with positive weights and 
the constraint x ^ represents the positivity prior on x. Assum- 
ing i.i.d. complex Gaussian noise with variance Cn-, the £2 norm 
term in (jSj is identical to a bound on the distribution with 2M 
degrees of freedom governing the noise level estimator. Therefore, 
we set this bound as = (2M + 4:y/M)an/2, where is 
the variance of both the real and imaginary part of the noise. This 
choice provides a likely bound for ||n||2, since the probability that 
||n||l exceeds is the probability that a with 2M degrees of 
freedom exceeds its mean, 2M, by at least two times the standard 
deviation 2V^M, which is very small. The solution to (jS]) is de- 
noted as A(i/, 0, W, e), which is a function of the data vector y, 
the measurement and weight matrices and W, and the bound e 
on the noise level estimator. 

Recall that in the reweighting approach a sequence of 



^ In fact, the compr essibility of Haar w avelet coefficients is controlled by 
the image TV norm dPeVore et alll 19921) . 
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weighted problems is solved, each using as weights essentially 
the inverse of the values of the solution of the previous problem. In 
practice, we update the weights at each iteration, i.e. after solving 
a complete weighted problem, by the function 



7 



7+|x| 



(9) 



where 7 plays the role of a stabilization parameter (ideally zero). 
Note that as 7 ^ the weighted ^l norm approaches the ^Q 
norm. To approximate the Iq non n by the reweighted algo- 
rithm, we use a homotopy strategy (iNocedal & Wrightll2006h and 
solve a sequence of weighted li problems using a decreasing se- 
quence {7*^*^}, with t denoting the iteration time variable. Under 
this scheme, a weighted ^l problem is first solved and its solution is 
used as the warm start initialization for the next problem that is ge- 
ometrically closer to the ^Q problem. This process is then repeated 
until the solution becomes stationary (iNocedal & Wright.2006) . 



4.3 The SARA algorithm 

The resulting algorithm, dubbed sparsity averaging reweighted 
analysis (SARA), is defined in Algorithm [T] A rate parameter /3, 
with < /3 < 1, controls the decrease of the sequence 7*^*-* = 
P^^^-^^ = /3*7o such that 7^*^ ^ Oast ^ 00. Ideally, 7^*^ should 
decrease to zero, but since we have noise, we set a lower bound as 
7*^*^ ^ cTc. The standard deviation of the noise in the representation 
domain is computed as cfc — y^M/qNan, which gives a rough 
estimate for a baseline above which significant signal components 
could be identified. As a starting point we set x^^^ as the solution 
of the £1 problem and 7*^°-* = as (^^^x^^''^, where as(-) stands 
for the empirical standard deviation of the signal, fixing the signal 
scale. The reweighting process ideally stops when the relative vari- 
ation between successive solutions Hx*^*-* — ||2/||x*^*~"^^ II2 is 
smaller than some bound 77, with < 77 < 1, or after the maximum 
number of iterations allowed, A^max, is reached. In our implemen- 
tation, which will be detailed in Section l5Jl we fix 77 = 10 ~^ and 



Algorithm 1 SARA algorithm for RI imaging 
Input: y, 0, e, ctc, /3, 77 and A^max. 
Output: Reconstructed image x. 
1: 
2: 



Initialize t 1, W^^^ = I and p = 1. 
Compute 

= A(i/,0,W(°),e), 

while p > 77 and t < A^max do 
Update the weight matrix 



W 



1) ^(*- 



^3 f 

Compute a solution 



SijJoriJ = 1,...,D 



Update 7*^*-* = max(/37 



Update p = 
t ^ t + 1 

end while 

return x 



(t-i) 



2/\\x 



5 SIMULATIONS AND RESULTS 

In this section we evaluate the performance of the SARA algorithm 
through numerical simulations. Firstly, we describe the practical 
implementation of SARA and state-of-the-art algorithms used as 
benchmarks. Secondly, we describe the simulation set up in the 
context of the inverse problem associated with (|6}. Thirdly, we 
report the results of the comparison of SARA to the state-of-the- 
art. Finally, we present an illustrative example of the performance 
of SARA in the presence of the spread spectrum phenomenon. 



5.1 SARA implementation and benchmarck algorithms 

For all the experiments we consider a concatenation of nine bases 
{q = 9), thus v|/ G M^""^ with D = 9A^, as the dictionary for 
SARA. The first basis is the Dirac basis. The eight rem aining bases 
are th e first eight Daubechies wavelets, Dbl-Db8 (jPaubechiesI 
The first Daubechies wavelet basis, Dbl, is the Haar wavelet 
basis. We use a fourth order decomposition depth for all wavelet 
base^fl- 

We compare SARA to state-of-the-art algorithms for li and 
TV minimization problems, as well as their reweighted versions, 
in terms of reconstruction quality and computation time. Firstly, 
the reweighted li problems considered are defined through the 
reweighting procedure described in Section lOl based on (|8], with 
specific choices of the sparsity dictionary ^ . We consider three 
different options for the Dirac basis, the Daubechies 8 wavelet 
basis and the isotropic undecimated wavelet redundant dictio- 
nary. The associated methods are respectively denoted R-BP, R- 
BPDbS and R-BPIU. The (non-reweighted) ^1 problems are de- 
fined through ^ with W = I and again different choices of the 
sparsity dictionary vU. We here consider four different options for 
the Dirac basis, the Daubechies 8 wavelet basis, the isotropic 
undecimated wavelet dictionary and the concatenation of 9 bases 
described above for SARA. The associated methods are respec- 
tively denoted BP, BPDb8, BPIU and BPSA. 

Secondly, the TV minimization problem is formulated as: 



mm Uc TV 



(10) 



subject to \\y — <t>x\\2 ^ e 
and X ^ 0. 

We have also implemented a reweighted version of TV (still 
through the procedure defined in Section |4]2), denoted as R-TV. Fi- 
na lly, we also use as benchmark the synthesis-based lUWT method 
of lLi etal.1 (l201lh and we denote it as lUWT. In the light of the 
discussion in Section [331 we assume that the reconstruction qual- 
ity provided by BP is essentially equivalent to that of the standard 
CLEAN algorithm, and that the reconstruction quality provided by 
lUWT is an upper bound on the reconstruction quality of any multi- 
scale implementation of CLEAN. 

To solve ([8]) and fTol ), we use the Douglas-Rachford splitting 
algorithm, which is tailored to solve problems of the form Q for 
n — 2 and with the additional property of not requiring di fferentia- 
bility of any of the functions jCombettes & Pesque3l2007l) . 



^ Experimental results have shown that the performance of SARA degrades 
if one of the bases is withdrawn from the dictionary. We do not present these 
results in detail here for the sake of space. 
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Figure 1. (color online). Test images with brightness values in the interval [0.01, 1] shown in a logio scale. Left: M31. Right: 30Dor. 




I: 



Ml 



Figure 2. (color online). Example of a simulated visibility coverage. Left panel: Variable density sampling pattern for 30% of coverage of the Fourier plane. 
The mask is symmetrized for visualization purposes: each pair of symmetric points represents one measured visibility. The real part of the corresponding dirty 
images for M31 and 30Dor are shown in the middle and right panels, respectively, in a logio scale. 



5.2 Simulations 

We evaluate the reconstruction performance of SARA by recover- 
ing well-known test images from simulated incomplete visibilities 
following the model in (|6]) with A = I. The test images used in 
all simulations are based on a HII region in M3 1 and the 30 Do- 
radus (30Dor) in the Large Magellanic Cloud. We use discrete 
models of size 256x256 as ground truth imagefl The test im- 
ages with brightness values in the interval [0.01, 1] are shown in 
Figure[T]in a logio scale. We consider incomplete visibility cover- 
ages generated by random variable density sampling profiles. Such 
profiles are characterized by denser sampling at low spatial fre- 
quencies than at high frequencies. This choice allows one to take 
into account the fact that most of the signal energy is usually con- 
centrated around low frequencies, also mimic king common gen eric 
sampling patterns in radio interferometry (see lPuv et al.l(l201lh for 
the exact shape of the density profile). In order to make the sim- 
ulated coverages more realistic we suppress the (0, 0) component 
of the Fourier plane from the measured visibilities. This generic 
profile approach allows us to evaluate the reconstruction quality 
for arbitrary percentages of visibility coverage and without concern 
for various telescope configurations. Let us recall that, accounting 
for image reality, we only take measurements in the half Fourier 
plane. A complete coverage of the half plane is referred to as a 
100% coverage. The left panel in Figure |2] shows an example of a 
sampling pattern for 30% of coverage of the Fourier plane (sym- 
metrized mask for visualization purposes). The middle and right 
panels in Figure |2] show the real part of the dirty images gener- 

^ Available at^ http : //casaguides . nrao . edu/ index . php| 



ated by this sampling pattern for M31 and 30Dor respectively. We 
evaluate numerically the reconstruction quality and computational 
speed of the algorithms considered for coverages between 10% and 
90%. 

We use as reconstruction quality metric the signal to noise ra- 
tio (SNR), which is defined as: 

SNR = 20 logio , (11) 

where Gx and (Tx-x denote the standard deviation of the original 
image and the standard deviation of the error image respectively. 
The visibilities are corrupted by complex Gaussian noise with a 
fixed input SNR of 30 dB, with the input SNR defined as in (TTl) 
with (Tx-x substituted by the standard deviation of the noise on 
each visibility. 

5.3 Results 

The left panels in Figure [3] and Figure]?] show, for M31 and 30Dor 
respectively, the SNR results against percentage of coverage for 
BP, BPDbS, TV, lUWT and SARA. Average values over 100 sim- 
ulations and associated one standard deviation error bars are re- 
ported. The results demonstrate that SARA outperforms state-of- 
the-art methods for all coverages. Moreover, the results for M31 
show considerable enhancement provided by SARA, with a gain of 
more than 6 dB for 10% of coverage and at least 3 dB for the rest 
of the coverages relative to other methods. The results for 30Dor, 
which is a more complicated image with both extended structures 
and compact structures, show a SNR improvement of SARA of at 
least 2 dB over all other methods. These results confirm the con- 
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Coverage percentage Coverage percentage 

Figure 3. (color online). Reconstruction results for M31. Left: Average reconstruction SNR against percentage of coverage. Right: Average computation 
time. Vertical bars identify one standard deviation errors around the mean over 100 simulations. The input SNR is set to 30 dB. 
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Figure 4. (color online). Reconstruction results for 30Dor. Left: Average reconstruction SNR against percentage of coverage. Right: Average computation 
time. Vertical bars identify one standard deviation errors around the mean over 100 simulations. The input SNR is set to 30 dB. 



jecture that average sparsity over multiple orthonormal bases rep- 
resents a stronger prior than sparsity over a single representation. 

It was found that the reweighting process never enhances the 
results for the benchmark algorithms significantly. R-BP provides 
worse results than BP for both test images achieving a SNR at least 
3 dB below BR R-TV does not show any improvement over TV for 
30Dor. For M31, R-TV reconstructions exhibit a gain of at most 1 
dB for coverages above 70%, and lower SNRs than TV for cover- 
ages below 70%. The reconstruction quality of R-BPDb8 is worse 
than that obtained by BPDbS for 30Dor. For M31, R-BPDb8 pro- 
vides a SNR improvement of at most 1 dB over BPDbS for cover- 
ages above 50% and lower SNRs than BPDbS for coverages below 
50%. The analysis-based BPIU and R-BPIU did not show any im- 
provement with respect to lUWT for 30Dor. For M31, BPIU and 
R-BPIU provide a gain of at most 1 dB and 3 dB, respectively, 
compared to lUWT. Also, the reconstruction quality of BPSA is al- 
ways worse than that achieved by SARA, being at least 3 dB below. 
Therefore, results for R-BP, R-BPDbS, R-TV, R-BPIU, BPIU and 
BPSA are not shown in Figure [3] and Figure |4] 



Computation times (on a 2.4 GHz Xeon quad core) are re- 
ported in the right panels of Figure |3] and Figure |4] for M3 1 and 
30Dor respectively, in a logio scale, for the same algorithms as 
those considered in the left panels. Again, average values over 
100 simulations and associated one standard deviation error bars 
are reported. Even though the concatenation of multiple bases and 
the reweighting process render the algorithm structure more costly, 
we see that the computation times are of the order of minutes for 
SARA, and very similar to those required for TV n iinimization and 
those reported for MS -CLEAN in the literature (IComwelllbOOSi : 
iLi et al.|[201lh . Note that all the preliminary implementations for 
these experiments are made in MATLAB. Therefore, significantly 
faster implementations can be achieved using a lower level pro- 
gramming language with custom optimized code. Also, the ver- 
satile framework of convex optimization offers a lot of room for 
improvement in terms of computational speed and efficiency. The 
Douglas-Rachford algorithm provides nice properties but, as em- 
phasised in Section |2] other proximal splitting methods exist that 
offer a parallel implementation structure, such as the proximal par- 



© 2012 RAS, MNRAS 000,[TJ{T2] 



SARA for radio -int erf erometric imaging 9 



allel algorithm and the simultaneous-direction method of multipli- 
ers, where all the proximit y operators can be computed in paral- 
lel rather than sequentially (ICombettes & Pesque3l201lh . Further- 
more, the simultaneous-direction method of multipliers offers a dis- 
tributed implementation structure. 

Next we present a visual assessment of the reconstruction 
quality of SARA compared to the benchmark algorithms. Figure [5] 
and Figure[6]show the results from M31 and 30Dor respectively for 
a coverage of 30%. The results are shown from top to bottom for 
BP, BPDbS, lUWT, TV and SARA respectively. The first column 
shows the reconstructed images in a logio scale, the second column 
shows the error images, defined as cc — x, in linear scale, and, the 
third column shows the real part of the residual images, defined as 
the difference between dirty images and dirty images constructed 
from recovered images, i.e. r — F^M^y — F^M^Ox, also in 
linear scale. We use the residual images as a visual quality mea- 
sure because it is commonly used in radio interferometry. In a few 
words, beyond a significant SNR increase, SARA provides an im- 
pressive reduction of visual artifacts relative to the other methods. 

More specifically, for M31 we can see that BP and BPDbS 
yield a good reconstruction of the inner structures but also give a 
lot of artifacts in the constant background, with BP having a lot of 
point-like errors as expected. TV also yields a good reconstruc- 
tion quality, since the original image has well defined edges, even 
though TV suffers from bias problems and is not capable of esti- 
mating the correct background having a slight shift in the brightness 
value. The lUWT method yields a nearly flat residual map. How- 
ever, this does not necessarily translate into a better reconstruction 
quality as can be observed in the error image. This highlights the 
fact that the common criterion of flatness of residual image is not an 
optimal measure of reconstruction fidelity. SARA yields a restored 
image with few artifacts in the background and small errors in the 
inner structures, showing the advantage of multiple basis represen- 
tations. 

For 30Dor we see that BP and lUWT do not yield good results, 
with reconstructions having a lot of spurious point-like structures. 
BPDbS yields a good reconstruction but also yields a lot of visual 
artifacts. TV achieves a fair reconstruction of the original image. 
However, the resulting image has a piecewise structure (leading to 
a cartoon-like visual effect) due to the TV prior. SARA yields the 
best recovery of the original image, being able to recover point- 
like structures as well as continuous extended structures. Note that 
all methods yield noise-like residual maps for this example but the 
actual reconstruction quality differs for all methods. 

5.4 Spread spectrum illustration 

In this subsection we present an illustrative example of the per- 
formance of SARA in the presence of the spread spectrum phe- 
nomenon. Recall that the spread spectrum phenomenon arises by 
partially relaxing the small FOV as sumption and includi ng a first 
order w term. It was introduced by IWiaux et aP (l2009bh as a po- 
tential optimization of the acquisition, leading to enhanced image 
reconstruction quality for sparsity bases that are not maximally in- 
coherent with the measurement basis. Spread spectrum incorpo- 
rates a modulating sequence in the measurement operator redefin- 
ing it as = MFAC, where C is a diagonal matrix with diagonal 
elements with unit norm. For the sake of simplicity we consider 
sequen ces with ran dom phase instead of quadratic phase as consid- 
ered bv lWiaux et al. (2009b). For our ill ustration we use Cygnus A 
as a test image dCarilU & Barthdlll996h for 30% coverage and 30 
dB of input SNR. We compare SARA against BP, BPDbS, BPIU, 



and TV. The SNR of the recovered image for each algorithm is 
as follows: BP (16.6 dB), BPDbS (36.0 dB), BPIU (29.9 dB), TV 
(2S.7 dB) and SARA (40.2 dB). The superior reconstruction quality 
of SARA is again clear. In Figure |7] we show reconstructed images 
for SARA and BPDbS. 



6 CONCLUDING REMARKS 

In this paper we have proposed a novel algorithm for image re- 
construction in radio interferometry dubbed Sparsity Averaging 
Reweighted Analysis (SARA). The algorithm relies on the conjec- 
ture that astrophysical signals are simultaneously sparse in multiple 
bases, in particular the Dirac basis, wavelet bases, or in their gradi- 
ent, so that promoting average signal sparsity over multiple wavelet 
bases represents an extremely powerful prior. Experimental results 
demonstrate that SARA outperforms state-of-the-art imaging meth- 
ods in the field, all based on the assumption of signal sparsity in a 
single basis or signal gradient sparsity. 

In future work we plan to focus on extending the current al- 
gorithm to handle continuous visibilities. In this respect, a stable 
version of the algorithm must be implemented in a low level pro- 
gramming language. Also, the final evolution should take advan- 
tage of proximal splitting algorithms with parallel and distributed 
structures allowing implementation in multicore architectures or in 
computer clusters. Such approaches are crucial for the scalability 
of the proposed algorithm to very high dimensions when dealing 
with continuous visibilities. 



ACKNOWLEDGMENTS 

We thank Pierre Vandergheynst, Jean-Philippe Thiran and Dimitri 
Van De Ville for providing the infrastructure to support our re- 
search. REG is supported by the Swiss National Science Foun- 
dation (SNSF) under grant 200021-130359. JDM is supported by 
a Newton International Fellowship from the Royal Society and the 
British Academy and, during this work, was also supported by a 
Leverhulme Early Career Fellowship from the Leverhulme Trust. 
YW is supported in part by the Center for Biomedical Imaging 
(CIBM) of the Geneva and Lausanne Universities, EPFL and the 
Leenaards and Louis-Jeantet foundations, and in part by the SNSF 
under grant PP00P2-12343S. 



References 

Abies, J.G., 1974, A&AS, 15, 3S3 
Baraniuk R., 2007, IEEE Signal Process. Mag., 24, llS 
Bhatnagar S., Comwell T. J., 2004, A&AS, 426, 747 
Blythe J. H., 1957, MNRAS, 117, 644 

Blumensath T., Davies M., 2009, App. Gomp. Harmonic Anal., 
27, 265 

Gandes E. J., 2006, in Sanz-Sole M., Soria J., Varona J. L., 
Verdera J., eds, Proc. Int. Congress Math. Vol. 3. Euro. Math. 
Soc. Zurich, p. 1433 

Gandes E. J., 200S, G. R. Acad. Sci., 346, 5S9 

Gandes E. J., Romberg J., Tao T., 2006, IEEE Trans. Inf. Theory, 
52, 4S9 

Gandes E. J., Wakin M., Boyd S., 200S, J. Fourier Anal. Appl., 
14, S77 

Gandes E. J., Eldar Y., Needell D., Randall P, 2010, App. Gomp. 
Harmonic Anal., 31, 59 



© 2012 RAS, MNRAS 000,[TJtT2l 



10 Carrillo et al. 




SARA for radio -int erf erometric imaging 1 1 








I 



- - -c 

I; 
I 

- - 0.1 

- - 

I 

I 



- - -c 




I 

I 
I 

I 

I 




I 

I 



I 

I 

I 



Figure 6. (color online). Reconstruction example of 30Dor. The results are shown from top to bottom for BP (SNR=16.67 dB), BPDbS (SNR=24.53 dB), 
lUWT (SNR=17.87 dB), TV (SNR=26.47 dB) and SARA (SNR=29.08 dB) respectively. The first column shows the reconstructed images in a logio scale, 
the second column shows the error images in linear scale, and the third column shows the residual images also in linear scale. 
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Figure 7. (color online). Reconstruction example of Cygnus A with spread spectrum acquisition for 30% coverage and 30 dB of input SNR. The results are 
shown from top to bottom for BPDbS (SNR=36.0 dB) and SARA (SNR=40.2 dB). The first column shows the reconstructed images in a logio ^^^^^ ^^^e 
second column shows the error images in linear scale. 
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